The Mathematical Framework
The core objective is to find a vector $x \in \mathbb{R}^n$ such that the linear combination $Ax = x_1a_1 + \dots + x_na_n$ best approximates $b$. This is frequently referred to as the regression of $b$ onto the regressors (the columns of $A$).
We focus on the residual vector $r = Ax - b$. In practice, we assume an overdetermined system where $m > n$: when $m = n$ and $A$ is non-singular, the optimal point is simply $A^{-1}b$, which drives the residual to zero and leaves nothing to optimize.
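To make the setup concrete, here is a minimal sketch in Python; the dimensions and random data are illustrative, not from the text.

```python
# Illustrative overdetermined system: m measurements, n unknowns, m > n.
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 5
A = rng.standard_normal((m, n))   # columns a_1, ..., a_n (the regressors)
b = rng.standard_normal(m)        # vector to approximate

x = rng.standard_normal(n)        # an arbitrary candidate point
r = A @ x - b                     # residual vector r = Ax - b
print(r.shape)                    # (100,): one residual per measurement
```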
Canonical Variations
Depending on the "flavor" of error we want to penalize, we choose different norms:
Least squares ($\ell_2$ norm). The most common approach. It minimizes the sum of the squares of the residuals, $\|Ax - b\|_2^2$. It is sensitive to large outliers but admits an analytical solution via the normal equations $A^T A x = A^T b$.
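As a sketch, reusing the illustrative data from above, the normal equations can be solved directly; `numpy.linalg.lstsq` is the numerically preferred route and should agree.

```python
# Least squares two ways: normal equations vs. numpy's built-in solver.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # QR/SVD-based solver
assert np.allclose(x_normal, x_lstsq)
print(np.sum((A @ x_normal - b) ** 2))            # minimized ||Ax - b||_2^2
```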
Chebyshev approximation ($\ell_\infty$ norm). Minimizes the maximum absolute residual, $\max_i |r_i|$. This is used when every single measurement must stay within a strict tolerance. It can be solved via the following Linear Program (LP), with variables $x \in \mathbb{R}^n$ and scalar $t$; a code sketch follows the LP:
minimize $t$
subject to $-t\mathbf{1} \preceq Ax - b \preceq t\mathbf{1}$
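A sketch of this LP with `scipy.optimize.linprog`, stacking $x$ and $t$ into one decision vector $z = (x, t)$; the data is again illustrative.

```python
# Chebyshev (ell-infinity) approximation as an LP: minimize t subject to
# -t*1 <= Ax - b <= t*1, with decision vector z = (x, t).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 100, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

c = np.r_[np.zeros(n), 1.0]                 # objective: minimize t
A_ub = np.block([[ A, -np.ones((m, 1))],    #  (Ax - b) <= t*1
                 [-A, -np.ones((m, 1))]])   # -(Ax - b) <= t*1
b_ub = np.r_[b, -b]
bounds = [(None, None)] * n + [(0, None)]   # x free, t >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
x_cheb, t = res.x[:n], res.x[n]
print(t, np.max(np.abs(A @ x_cheb - b)))    # the two should agree
```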
$\ell_1$ approximation. Minimizes the sum of absolute residuals, $\sum_i |r_i|$. This approach is robust to outliers, since errors are not squared. It is also solvable via an LP, now with a vector of slack variables $t \in \mathbb{R}^m$ bounding each residual componentwise; a code sketch follows the LP:
minimize $\mathbf{1}^T t$
subject to $-t \preceq Ax - b \preceq t$
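The same pattern handles the $\ell_1$ LP, except $t$ is now a vector with one slack per residual; a sketch under the same illustrative data:

```python
# ell-1 approximation as an LP: minimize 1^T t subject to -t <= Ax - b <= t,
# with decision vector z = (x, t) and t in R^m.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 100, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

c = np.r_[np.zeros(n), np.ones(m)]          # objective: minimize 1^T t
I = np.eye(m)
A_ub = np.block([[ A, -I],                  #  (Ax - b) <= t
                 [-A, -I]])                 # -(Ax - b) <= t
b_ub = np.r_[b, -b]
bounds = [(None, None)] * n + [(0, None)] * m   # x free, t >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
x_l1 = res.x[:n]
print(np.sum(np.abs(A @ x_l1 - b)))         # minimized ell-1 error
```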
Estimation Context
In many engineering fields, we assume a true state $x$ is obscured by noise: $y = Ax + v$. Our goal is to find an estimate $\hat{x} = \text{argmin}_z \|Az - y\|$. By choosing the norm, we are effectively making an assumption about the statistical distribution of the noise $v$.
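For example, if the entries of $v$ are i.i.d. Gaussian, $v_i \sim \mathcal{N}(0, \sigma^2)$, the log-likelihood of $y$ given $z$ is, up to constants, $-\|Az - y\|_2^2 / (2\sigma^2)$, so maximum likelihood reduces to least squares:

$$
\hat{x}_{\text{ML}} = \operatorname*{argmax}_z \, \log p(y \mid z) = \operatorname*{argmin}_z \, \|Az - y\|_2^2.
$$

Laplacian noise, with density proportional to $e^{-|v_i|/a}$, instead makes the $\ell_1$ minimizer the maximum-likelihood estimate, which is consistent with its robustness to heavy-tailed outliers.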